Distance-d covering problems in scale-free networks with degree correlations 
A number of problems in communication systems demand the distributed allocation of network resources in order to provide better services, sampling and distribution methods. Solving these problems is becoming more challenging as communication networks grow in size and complexity. We report here on a heuristic method to find near-optimal solutions to the covering problem in real communication networks. We demonstrate that whether a centralized or a distributed design should be used depends on the degree correlations between connected vertices. We also show that the general belief that one can efficiently solve most problems on networks with a power-law degree distribution by targeting the hubs is not valid for assortative networks.



Allocating network resources to satisfy a given service with the least use of resources is a frequent problem in communication networks. For instance, a highly topical problem is the development and deployment of a digital immune system to protect technological networks from virus spreading. In this case, it is worthwhile to determine whether a centralized organization or a distributed approach is the best choice [1]. Clearly, this is the first decision, and perhaps one of the most fundamental, that must be taken before proceeding with other technical issues. Another natural application is the placement of web mirror servers. Solving such problems is becoming more challenging as social and technological networks grow in size. Heuristic approaches that provide hints and pave the way for more elaborate strategies would be welcome. For this purpose we must identify which individuals are the ideal candidates to transmit, collect, or monitor information across the net, or to prevent information and viruses from spreading across it [...].

The solution to this and similar problems may be computationally easy or hard depending on the topological properties of the underlying graph [...]. In particular, communication networks and many other real-world graphs are characterized by wide fluctuations in the vertex degrees [..., 13], where the degree of a vertex is the number of edges attached to it. This means that, in addition to a large number of low-degree vertices, there are hubs connected to a large number of other vertices. The existence of hubs has been exploited to develop strategies that enhance network resilience to damage [...], prevent virus spreading [..., 6], and improve searching algorithms [5]. Additionally, real-world networks are characterized by degree correlations between connected vertices [...]. These degree correlations have been shown to affect the computational complexity of hard problems on graphs with wide fluctuations in the vertex degrees [...].

We report here on a heuristic method that allows us to find near-optimal solutions to the covering problem in real-world networks. Specifically, we are interested in computing the minimum set of covered vertices (henceforth called servers) such that every vertex is either covered or has at least one covered vertex at a distance of at most d (the distance-d covering problem). Here the distance between two vertices in the graph is the minimum number of hops needed to go from one vertex to the other. Each server then provides service to, or monitors, the vertices within distance d of it. Using a heuristic algorithm that targets high-degree vertices, we compute an upper bound on the minimum fraction of servers needed to cover these graphs. We find that the solution to the distance-d covering problem strongly depends on the degree of similarity between connected vertices. As a consequence, we show that when designing networked systems, the choice between a centralized and a distributed design depends on the network properties at a local level. Our primary intent is not to develop an optimal algorithm. Instead, our main focus is on assessing the impact of correlations on the design of networked systems. We thereby aim to show whether moving to more complex heuristics is justified in the context of covering problems in real networks.
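To make the definition concrete, the following minimal Python sketch checks whether a candidate set of servers solves the distance-d covering problem. It runs a breadth-first search from all servers simultaneously and stops expanding at depth d. The adjacency-dict graph representation and the function name `is_distance_d_cover` are our own assumptions for illustration, not part of the original work.

```python
from collections import deque


def is_distance_d_cover(adj, servers, d):
    """Return True if every vertex is a server or lies within d hops of one.

    adj: dict mapping each vertex to a list of its neighbors (assumed format).
    servers: iterable of covered vertices, all of which must be keys of adj.
    """
    # Multi-source BFS: every server starts at distance 0.
    dist = {s: 0 for s in servers}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond distance d
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # The cover is valid only if every vertex was reached within d hops.
    return len(dist) == len(adj)


# Usage on a 5-vertex path 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_distance_d_cover(path, {2}, 2))     # True
print(is_distance_d_cover(path, {2}, 1))     # False
print(is_distance_d_cover(path, {1, 3}, 1))  # True
```

For d = 1, this check reduces to testing whether the servers form a dominating set.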

FIG. 1: Average degree $\langle K^{(d)} \rangle_k$ of the distance-d neighbors of a vertex with degree k, for d = 1 (circles), d = 2 (squares), d = 3 (diamonds) and d = 4 (triangles). Note that the average neighbor degree introduced in Ref. [15] corresponds to $\langle K^{(1)} \rangle_k$. (a) $\langle K^{(d)} \rangle_k$ vs k for the AS graph. The inset shows the exponent $\nu_d$ obtained from the best fit to the power law $\langle K^{(d)} \rangle_k = A k^{\nu_d}$ in the range k > 1. Similar results are obtained for the Gnutella graph, but with more fluctuations due to its small size. (b) $\langle K^{(d)} \rangle_k$ vs k for the Router graph. The inset shows the exponent $\nu_d$ obtained from the best fit to the power law $\langle K^{(d)} \rangle_k = A k^{\nu_d}$ in the range 10 < k < 100.

FIG. 2: Average fraction of servers $\langle x \rangle$ needed to cover a graph under the constraint that every vertex should have a server at a distance of at most d = 1, using the leaf-removal (circles) and local (squares) algorithms, as a function of the exponent $\nu_1$ defined in Fig. 1. Negative and positive values correspond to disassortative and assortative graphs, respectively.

The communication networks considered in this work are the following. AS: autonomous-system-level graph representation of the Internet as of April 16th, 2001. Gnutella: snapshot of the Gnutella peer-to-peer network, provided by Clip2 Distributed Search Solutions. Router: router-level graph representation of the Internet. All these graphs are sparse, with an average degree of around 3. They are also small worlds [17], with an average distance between vertices of less than 10, and they are characterized by a power-law degree distribution $p_k \sim k^{-\gamma}$ with $\gamma \approx 2.2$. A detailed characterization of these graphs is presented in Refs. [...] (Gnutella) and [...] (AS and Router graphs). They differ, however, in the degree correlations between nearest-neighbor vertices. The AS and Gnutella graphs exhibit disassortative degree correlations, with a tendency to have connections between vertices with dissimilar degrees (Fig. 1(a)). In contrast, the Router graph displays assortative degree correlations, with a tendency to establish connections between vertices with similar degrees (Fig. 1(b)). In this work we are interested in covering problems beyond d = 1, so we also analyze the degree correlations for d > 1 [21]. For the disassortative graphs, the average degree of distance-d neighbors $\langle K^{(d)} \rangle_k$, restricted to root vertices with degree k, follows the same trend as $\langle K^{(1)} \rangle_k$ and tends to be less correlated for larger d (Fig. 1(a)). For the assortative graph, however, the degree correlations remain assortative up to d = 2 and become disassortative for d > 2 (Fig. 1(b)). Finally, for d > 6 the degree correlations in the originally assortative graph show a trend similar to that of the disassortative graphs.
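The quantity $\langle K^{(d)} \rangle_k$ and the exponent $\nu_d$ can be estimated roughly as follows. This minimal sketch rests on two assumptions. First, $\langle K^{(d)} \rangle_k$ is obtained by averaging the degrees of the vertices at distance exactly d from each root, and then averaging over all roots of degree k, which mirrors the usual d = 1 convention. Second, $\nu_d$ is the least-squares slope of $\log \langle K^{(d)} \rangle_k$ versus $\log k$ over a chosen degree window. The function names are hypothetical.

```python
import math
from collections import defaultdict, deque


def shell_at_distance(adj, root, d):
    """Vertices at distance exactly d from root, found by a BFS truncated at depth d."""
    dist = {root: 0}
    queue = deque([root])
    shell = []
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            shell.append(u)
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return shell


def avg_neighbor_degree_by_k(adj, d):
    """<K^(d)>_k: mean degree of distance-d neighbors, averaged over roots of degree k."""
    per_k = defaultdict(list)
    for root, nbrs in adj.items():
        shell = shell_at_distance(adj, root, d)
        if shell:  # roots with no vertex at distance d are skipped
            per_k[len(nbrs)].append(sum(len(adj[v]) for v in shell) / len(shell))
    return {k: sum(vals) / len(vals) for k, vals in per_k.items()}


def fit_exponent(knn, kmin, kmax=math.inf):
    """Least-squares slope nu_d of log <K^(d)>_k versus log k, using kmin < k < kmax."""
    pts = [(math.log(k), math.log(v)) for k, v in knn.items() if kmin < k < kmax]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx  # negative: disassortative, positive: assortative


# A star is the extreme disassortative case: the hub only touches leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
knn1 = avg_neighbor_degree_by_k(star, 1)
print(knn1)                        # {4: 1.0, 1: 4.0}
print(fit_exponent(knn1, kmin=0))  # close to -1
```

On the real graphs, one would call `fit_exponent(knn, kmin=1)` for the AS window or `fit_exponent(knn, kmin=10, kmax=100)` for the Router window quoted in the Fig. 1 caption.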

We propose the following heuristic algorithm to obtain an upper bound for the distance-d covering problem. Local algorithm: for every vertex in the graph, cover the highest-degree vertex at a distance of at most d from that vertex. If more than one vertex has the highest degree, one of them is selected at random and covered. To test this algorithm we first consider the case d = 1, known as the dominating set problem [...]. In this case we can use a leaf-removal algorithm as a reference method, which yields a nearly optimal solution together with an error estimate [23]. The leaf-removal algorithm is defined as follows. To each vertex i we assign two state variables $x_i$ and $y_i$, where $x_i = 0$ ($x_i = 1$) if the vertex is uncovered (covered) and $y_i = 0$ ($y_i = 1$) if the vertex is undominated (dominated). Here a vertex is said to be dominated if at least one of its neighbors is covered. We start with all vertices uncovered and undominated ($x_i = y_i = 0$ for all i) and iterate the following steps. (i) Select a vertex with degree one (a leaf). If it is not dominated, cover its neighbor and mark the neighbors of that neighbor (the second neighbors of the leaf) as dominated. Then remove the leaf, its neighbor, and all their incident edges. (ii) If no vertex with degree one is found, cover the vertex with the largest degree (hub) and mark its neighbors as dominated. Then remove the hub and all its incident edges. Finally, any remaining vertices with degree zero are covered if they are not dominated, and removed from the graph. Step (i) always makes an optimal choice. Therefore, the error in computing the average fraction of covered vertices $\langle x \rangle = \sum_{i=1}^{N} x_i / N$, where N is the number of vertices, is less than or equal to the fraction of vertices covered by applying step (ii).
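The leaf-removal procedure just described can be sketched in Python as below. This is a hedged, unoptimized sketch, and three choices in it are our own assumptions. The text does not say what happens to a leaf that is already dominated, so we simply remove it. Leaves are picked at random. Ties between hubs are broken by dictionary order. The function returns the covered set together with the error bound, that is, the fraction of vertices covered in step (ii).

```python
import random


def leaf_removal(adj, rng=None):
    """Leaf-removal heuristic for the d = 1 covering (dominating set) problem.

    Returns (covered, error): the covered vertices and the fraction of
    vertices covered in step (ii), which bounds the error in <x>.
    """
    rng = rng if rng is not None else random.Random()
    g = {u: set(nbrs) for u, nbrs in adj.items()}  # working copy that shrinks
    n = len(g)
    covered, dominated = set(), set()
    hubs_covered = 0

    def remove(u):
        # Delete u and all its incident edges from the working graph.
        for v in g.pop(u):
            g[v].discard(u)

    while g:
        leaves = [u for u, nbrs in g.items() if len(nbrs) == 1]
        if leaves:
            # Step (i): select a leaf at random.
            leaf = rng.choice(leaves)
            (nb,) = g[leaf]
            if leaf in dominated:
                remove(leaf)  # assumption: an already dominated leaf is dropped
            else:
                covered.add(nb)
                dominated.update(g[nb])  # the leaf and its second neighbors
                remove(leaf)
                remove(nb)
        elif any(g.values()):
            # Step (ii): no leaf left, so cover the highest-degree vertex (hub).
            hub = max(g, key=lambda u: len(g[u]))
            covered.add(hub)
            hubs_covered += 1
            dominated.update(g[hub])
            remove(hub)
        else:
            # Only isolated vertices remain: cover those that are not dominated.
            for u in list(g):
                if u not in dominated:
                    covered.add(u)
                remove(u)
    return covered, (hubs_covered / n if n else 0.0)


# A 5-cycle has no leaves, so the first move must be a step (ii) hub choice.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
covered, error = leaf_removal(cycle5, random.Random(0))
print(len(covered) / len(cycle5), error)  # 0.4 0.2
```

Rescanning all vertices for leaves on every iteration makes this sketch quadratic. A version meant for graphs of Internet size would instead maintain a queue of current leaves.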

The comparison between the local and leaf-removal algorithms is shown in Fig. 2. First, notice that the solutions obtained with the leaf-removal algorithm are almost exact for the networks considered here and d = 1. The local algorithm yields satisfactory, though non-optimal, solutions to the covering problem, with some differences depending on the correlations between connected vertices. For the AS and Gnutella graphs, which exhibit disassortative degree correlations, the local algorithm gives a good estimate, quite close to the optimal one for the AS graph. In contrast, for the Router graph we observe a larger deviation from the optimal solution. This difference arises because the local algorithm exploits the degree fluctuations among connected vertices. These fluctuations are larger in disassortative graphs, where connected vertices are likely to have different degrees. In assortative graphs, by contrast, two vertices selected at random may differ greatly in degree, but connected vertices tend to have similar degrees, which results in poorer solutions. These results indicate that the general belief may not hold for assortative graphs. That belief holds that heuristic algorithms targeting the hubs are sufficient to solve computational problems on graphs with wide degree fluctuations.

FIG. 3: (a) Average fraction of servers $\langle x \rangle$ covering the graph for different values of d. The continuous lines are the best fits to an exponential decay. (b) Average fraction of vertices $\langle n \rangle$ served by a server for different values of d. The inset shows the graph-size dependence of $\langle n \rangle$ for the AS graph and d = 1, 2.
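The local algorithm defined earlier can be sketched as follows, assuming an adjacency-dict graph. We read "at a distance at most d from the vertex" as including the vertex itself (distance 0). The helper names `ball` and `local_cover` are hypothetical.

```python
import random
from collections import deque


def ball(adj, root, d):
    """Vertices within distance d of root (root included), via a BFS truncated at depth d."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] < d:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return list(dist)


def local_cover(adj, d, rng=None):
    """Local algorithm: each vertex covers the highest-degree vertex within distance d.

    Ties are broken uniformly at random. Returns the set of servers.
    """
    rng = rng if rng is not None else random.Random()
    servers = set()
    for u in adj:
        candidates = ball(adj, u, d)
        kmax = max(len(adj[v]) for v in candidates)
        servers.add(rng.choice([v for v in candidates if len(adj[v]) == kmax]))
    return servers


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(local_cover(star, 1))  # {0}: every vertex picks the hub
```

Every vertex selects a server inside its own d-ball, so the output is always a valid distance-d cover. This is why the fraction of servers it returns is an upper bound on the optimum.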

The d = 1 covering problem results in a distributed architecture because a finite fraction of the vertices is covered. Let us now extend the method and discuss the results obtained with the local algorithm for the more general and complex problem d > 1. In Fig. 3(a) we show that the average fraction of servers decays exponentially fast with increasing d. Hence, allowing the servers to be farther away substantially decreases the number of servers required. This exponential decay is a consequence of the small-world property of these networks: the average distance between vertices grows as the logarithm of the number of vertices, or more slowly. Equivalently, the number of vertices within distance d of a given vertex grows roughly exponentially with d. The decrease in $\langle x \rangle$ is, however, achieved at the expense of an increase in the average fraction of vertices $\langle n \rangle$ covered by a server (Fig. 3(b)). This is a key metric, as it marks the trade-off between the number of servers needed and their capacity.
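Given a set of servers, the two metrics discussed above can be computed as follows. The text does not spell out exactly how $\langle n \rangle$ is defined, so this sketch rests on our own reading. We assume that the vertices served by a server are those within distance d of it. Under that assumption, $\langle n \rangle$ is the mean, over servers, of the size of that neighborhood divided by N. The function names are hypothetical.

```python
from collections import deque


def ball_size(adj, root, d):
    """Number of vertices within distance d of root, root included (truncated BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] < d:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return len(dist)


def cover_metrics(adj, servers, d):
    """Return (<x>, <n>) for a given server set and distance d.

    <x>: fraction of vertices that are servers.
    <n>: mean over servers of the fraction of vertices within distance d (assumed definition).
    """
    n_vertices = len(adj)
    x = len(servers) / n_vertices
    n = sum(ball_size(adj, s, d) for s in servers) / (len(servers) * n_vertices)
    return x, n


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(cover_metrics(path, {1, 3}, 1))  # (0.4, 0.6)
print(cover_metrics(path, {2}, 2))     # (0.2, 1.0)
```

The example shows the trade-off in miniature: raising d from 1 to 2 halves $\langle x \rangle$ but requires the single remaining server to reach the whole graph.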

Again, we observe a remarkable difference depending on the graph assortativity. For the Gnutella and AS graphs, which have disassortative correlations, $\langle n \rangle$ increases significantly from d = 1 to d = 2. Indeed, a finite-size study of the AS graph, whose size grew from 1997 to 2002 [13], reveals that $\langle n \rangle$ decreases to zero with increasing graph size for d = 1, while it remains almost constant for d = 2 or larger (see inset of Fig. 3(b)). On the other hand, in the Router graph, which has assortative correlations, $\langle n \rangle$ increases much more slowly with d and is almost zero up to d = 3 (Fig. 3(b)). These results are the signature of a phase transition. There is a threshold distance $d_c$ such that the average fraction of vertices served by a covered vertex is very small for $d \le d_c$, going to zero with increasing N, while it is finite for $d > d_c$. For disassortative graphs $d_c = 1$, while for assortative ones $d_c > 1$. Note that the value $d_c \approx 3$ for the Router graph coincides with the distance at which the degree correlations become disassortative. This coincidence indicates that the phase transition is determined by the change in the degree correlations. Furthermore, this transition provides a practical guide for choosing the desired trade-off between $\langle x \rangle$ and $\langle n \rangle$.

Since the graphs considered here are characterized by wide fluctuations in the vertex degrees, we have also computed the average number of covered vertices $\langle n \rangle_k$, restricted to vertices with the same degree k. In all cases we observe that $\langle n \rangle_k$ increases with k. This is expected from the definition of the local algorithm, which targets high-degree vertices. Two distinct behaviors are once again observed depending on the degree correlations. In the disassortative graphs, $\langle n \rangle_k$ already reaches 10% of the vertices for d = 2 and k > 10 (Fig. 4(a)). In contrast, in the assortative graphs, such large values of $\langle n \rangle_k$ appear only beyond d = 4.
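The degree-resolved quantity $\langle n \rangle_k$ can be obtained by grouping the servers by their degree k before averaging. As in our reading of $\langle n \rangle$, this sketch assumes that a server serves every vertex within distance d of it. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def served_fraction(adj, s, d):
    """Fraction of all vertices within distance d of server s (s included)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if dist[u] < d:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return len(dist) / len(adj)


def served_fraction_by_degree(adj, servers, d):
    """<n>_k: mean served fraction over servers of degree k, returned as {k: <n>_k}."""
    by_k = defaultdict(list)
    for s in servers:
        by_k[len(adj[s])].append(served_fraction(adj, s, d))
    return {k: sum(vals) / len(vals) for k, vals in sorted(by_k.items())}


# Toy graph: a hub (0) with three leaves, joined to a short tail 4-5.
g = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
print(served_fraction_by_degree(g, {0, 5}, 1))  # the degree-4 hub serves 5/6, the leaf 2/6
```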

FIG. 4: Average number of covered vertices (servers) $\langle n \rangle_k$, restricted to vertices with the same degree k, for several values of d. The figures show that for disassortative graphs (a), the servers need a large capacity to serve a finite fraction of the graph, even for small to moderate values of d. On the contrary, for assortative graphs (b), the number of vertices served by a server remains a negligible fraction of N up to large values of d.

The striking differences between disassortative and assortative correlations have important consequences for how resources are allocated. For disassortative graphs, except for the case d = 1, one would need servers with a vast capacity, each covering a large fraction of the vertices. The most efficient strategy is, therefore, to allocate resources to a few servers with a large capacity. In this case, the scalability of the server system would be determined by the capacity of the individual servers, which would have to increase as the graph size grows. In the assortative case, we have a different scenario. The number of servers decreases with increasing d much less dramatically than for the disassortative graphs. In compensation, each server covers a small fraction of the vertices. Hence, the most efficient strategy is to allocate the resources to a large number of servers with a limited capacity. The scalability of the system would then be driven by the number of required servers, which grows with the graph size. In turn, when designing communication networks, we can choose between disassortative and assortative topologies depending on the available resources. A disassortative topology is more appropriate for a centralized design, with a few servers of large capacity. An assortative network is best suited for a distributed design, with a large number of servers of limited capacity.

It is worth stressing that the proposed heuristic relies only on local knowledge of the network: it requires information about the graph topology up to a distance d. This property is of utmost importance for most real applications. Indeed, all the graphs considered here are incomplete representations of the systems they aim to represent [...], as generally happens in graph representations of large systems.

Finally, the present study shows that the general belief that targeting the hubs is enough to efficiently solve most problems on networks with a power-law degree distribution (percolation, spreading, searching, covering, etc.) is not valid if the degree correlations are assortative. This conclusion is especially relevant to the analysis of social systems, where assortative networks are the general rule.
Furthermore, we have shown that whether the degree cor- 
relations are assortative or disassortative may depend on 
the distance between the connected vertices, indicating 
that different strategies may be used depending on the 
characteristic distance of the covering problem. 